
Refactor LeanAI to production-ready MLOps pipeline with modular training #55

Open
k2jac9 wants to merge 2 commits into main from claude/explore-repository-1nowi

k2jac9 (Owner) commented Apr 17, 2026

Summary

This PR refactors the LeanAI body fat prediction project from a monolithic structure into a production-ready MLOps pipeline. The codebase is reorganized into modular components with a test suite and containerization support, while keeping the core SVR model and its reported accuracy of <1% error (MAE: 0.10, R²: 0.9996).

Key Changes

Core ML Pipeline

  • Modularized training (modeling/train.py): Extracted feature engineering (polynomial features, RFE, PCA) into reusable functions with explicit model building and evaluation
  • Prediction module (modeling/predict.py): Separated inference logic with support for single predictions and batch processing
  • Dataset processing (dataset.py): Implemented feature engineering pipeline (BMI, waist-to-hip ratios, arm ratios) with z-score-based outlier removal
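
As a rough illustration of the dataset step described above, the sketch below derives BMI and a waist-to-hip ratio and then drops rows by z-score. It is not the code in dataset.py: the function names, the column names (`weight_kg`, `height_cm`, `waist_cm`, `hip_cm`), and the threshold of 3 standard deviations are all assumptions chosen for the example.

```python
import pandas as pd


def add_ratio_features(df: pd.DataFrame) -> pd.DataFrame:
    """Add BMI and waist-to-hip ratio columns (column names are hypothetical)."""
    out = df.copy()
    # BMI = weight in kg divided by height in metres squared.
    out["bmi"] = out["weight_kg"] / (out["height_cm"] / 100.0) ** 2
    out["waist_hip_ratio"] = out["waist_cm"] / out["hip_cm"]
    return out


def remove_outliers_zscore(
    df: pd.DataFrame, columns: list[str], threshold: float = 3.0
) -> pd.DataFrame:
    """Drop rows where any listed column is more than `threshold` std devs from its mean.

    The threshold of 3.0 is an assumed default, not a value taken from the PR.
    Columns with zero variance should not be passed in (their z-scores are undefined).
    """
    values = df[columns]
    z_scores = (values - values.mean()) / values.std(ddof=0)
    keep = (z_scores.abs() <= threshold).all(axis=1)
    return df.loc[keep].reset_index(drop=True)
```

A typical call would chain the two, for example `remove_outliers_zscore(add_ratio_features(raw), ["weight_kg", "bmi"])`, so that derived features are also screened for extreme values.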

API & Deployment

  • Refactored FastAPI (api/main.py): Simplified endpoint structure with proper CORS configuration, health checks, and environment-based model loading
  • MLOps FastAPI wrapper (mlops/src/fastapi_app/main.py): Added MLflow model registry integration alongside local model loading
  • Docker improvements: Updated all Dockerfiles to Python 3.12, added health checks, and improved layer caching

Testing & Quality

  • Comprehensive test suite (tests/): Added pytest tests for API endpoints, dataset processing, and model training
  • CI/CD pipeline (.github/workflows/ci.yml): GitHub Actions workflow for linting, testing, and Docker builds
  • Code quality: Integrated Ruff for linting/formatting with proper configuration in pyproject.toml

MLOps Infrastructure

  • Drift detection (mlops/src/monitoring/drift_detection.py): Evidently AI integration for data, target, and regression performance monitoring
  • Retraining pipeline (mlops/src/retraining/retrain_flow.py): Metaflow-based automated retraining triggered by drift detection
  • MLflow integration (mlops/src/train_and_log_mlflow.py): Model tracking and experiment logging
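
The drift-triggered retraining idea can be sketched without the actual tooling. The example below deliberately swaps in a two-sample Kolmogorov-Smirnov test from SciPy as a simplified stand-in for the Evidently AI reports, and a plain boolean as a stand-in for kicking off the Metaflow flow. The function names, the significance level, and the "share of drifted features" rule are assumptions for illustration only.

```python
import numpy as np
from scipy.stats import ks_2samp


def drifted_features(
    reference: dict[str, np.ndarray],
    current: dict[str, np.ndarray],
    alpha: float = 0.05,
) -> list[str]:
    """Return the features whose current distribution differs from the reference.

    Uses a two-sample KS test per feature; `alpha` is an assumed significance level.
    """
    flagged = []
    for name, ref_values in reference.items():
        result = ks_2samp(ref_values, current[name])
        if result.pvalue < alpha:
            flagged.append(name)
    return flagged


def should_retrain(
    reference: dict[str, np.ndarray],
    current: dict[str, np.ndarray],
    max_drift_share: float = 0.5,
) -> bool:
    """Signal retraining when at least `max_drift_share` of the features drifted."""
    flagged = drifted_features(reference, current)
    return len(flagged) / len(reference) >= max_drift_share
```

In a setup like the one described, the boolean from `should_retrain` would be the point at which a retraining flow is launched; how the PR wires this to Metaflow is not shown here.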

Documentation & Configuration

  • Streamlined README: Condensed from 301 to 241 lines with quick-start guides, tech stack overview, and project structure
  • Updated dependencies: Pinned versions in requirements.txt and pyproject.toml with optional dependency groups (mlops, dev, viz)
  • Makefile modernization: Added targets for data processing, plotting, training, API, testing, and Docker operations
  • Cross-platform support: Updated pixi.toml for Linux, macOS, and Windows with focused dependency set

Visualization

  • EDA module (plots.py): Refactored plotting functions for distributions, correlation heatmaps, target relationships, and outlier detection

Notable Implementation Details

  • Feature engineering pipeline uses PolynomialFeatures (degree=2) → RFE (8 features) → PCA (5 components) before SVR
  • Model artifacts are serialized with joblib (pickle-based files) so trained models can be reloaded for inference
  • API supports both JSON and HTML form-based predictions with proper content negotiation
  • Docker Compose includes optional Jupyter service for development and training service for batch processing
  • Comprehensive error handling and logging throughout the pipeline
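
The feature engineering chain listed above (PolynomialFeatures → RFE → PCA → SVR) maps naturally onto a scikit-learn `Pipeline`. The following is a minimal sketch under stated assumptions, not the contents of modeling/train.py: the PR does not say which estimator drives RFE, so `LinearRegression` is assumed here (RFE needs an estimator exposing `coef_` or `feature_importances_`), and `include_bias=False` and the default SVR hyperparameters are likewise assumptions.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.feature_selection import RFE
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import PolynomialFeatures
from sklearn.svm import SVR


def build_pipeline() -> Pipeline:
    """Assemble degree-2 polynomial features, RFE to 8 features, PCA to 5, then SVR."""
    return Pipeline(
        steps=[
            # Expand inputs with squares and pairwise products.
            ("poly", PolynomialFeatures(degree=2, include_bias=False)),
            # Keep the 8 features ranked best by the (assumed) linear estimator.
            ("rfe", RFE(estimator=LinearRegression(), n_features_to_select=8)),
            # Project the selected features onto 5 principal components.
            ("pca", PCA(n_components=5)),
            # Final regressor; hyperparameters are library defaults, not the PR's.
            ("svr", SVR()),
        ]
    )


# Synthetic data only, to show the pipeline fits and predicts end to end.
rng = np.random.default_rng(0)
X = rng.normal(size=(60, 4))
y = 2.0 * X[:, 0] + X[:, 1] ** 2 + rng.normal(scale=0.1, size=60)

pipeline = build_pipeline()
pipeline.fit(X, y)
predictions = pipeline.predict(X)
```

Wrapping all steps in one `Pipeline` object means a single joblib dump captures the fitted transformers together with the model, which keeps training and inference preprocessing in sync.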

https://claude.ai/code/session_018HH3y8TvMgXG3PEmdEjoRS

claude added 2 commits April 7, 2026 01:49
- Replace placeholder stubs with real implementations (modeling/train.py,
  modeling/predict.py, dataset.py, plots.py, drift_detection.py, retrain_flow.py)
- Fix API security: remove debug print, tighten CORS, add /health endpoint
- Upgrade Dockerfiles from Python 3.9 to 3.12 with multi-service compose
- Add pytest test suite (test_api.py, test_dataset.py, test_modeling.py)
- Add GitHub Actions CI/CD (lint, test, docker build)
- Consolidate dependencies in pyproject.toml with optional groups
- Fix pixi.toml for cross-platform support (linux, macos, windows)
- Improve Makefile with full dev workflow commands
- Update README with architecture diagrams and quick start guide

https://claude.ai/code/session_018HH3y8TvMgXG3PEmdEjoRS
- Fix encode_sex() to handle pandas 3.0 StringDtype (not just object)
- Fix API: use modern Starlette TemplateResponse signature (request, name, context)
- Clear broken __init__.py (referenced non-existent leanai module)
- Improve test data for outlier test
- Add ruff exclusions for legacy MLflow artifacts and pre-existing scripts
- Apply ruff formatter across all new code

All 18 tests pass, ruff check clean.
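
For context on the encode_sex() fix, here is one way a dtype-agnostic encoder could look. This is a hypothetical sketch, not the function from the PR: the column name `Sex`, the label mapping, and the signature are assumptions. The idea is to avoid checking for `object` dtype specifically and instead treat any non-numeric column as text.

```python
import pandas as pd
from pandas.api.types import is_numeric_dtype


def encode_sex(df: pd.DataFrame, column: str = "Sex") -> pd.DataFrame:
    """Map text sex labels to 0/1 regardless of whether the column is object or string dtype.

    The column name and the mapping below are assumptions for illustration.
    """
    out = df.copy()
    if not is_numeric_dtype(out[column]):
        mapping = {"m": 1, "male": 1, "f": 0, "female": 0}
        # Normalise to plain lowercase strings so both dtypes follow the same path.
        labels = out[column].astype(str).str.strip().str.lower()
        out[column] = labels.map(mapping)
    return out
```

Labels outside the mapping become missing values, so a caller would likely validate or drop them before training.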

https://claude.ai/code/session_018HH3y8TvMgXG3PEmdEjoRS
